Kart: a divide-and-conquer algorithm for NGS read alignment
نویسندگان
چکیده
Motivation Next-generation sequencing (NGS) provides a great opportunity to investigate genome-wide variation at nucleotide resolution. Due to the huge amount of data, NGS applications require very fast and accurate alignment algorithms. Most existing algorithms for read mapping basically adopt seed-and-extend strategy, which is sequential in nature and takes much longer time on longer reads. Results We develop a divide-and-conquer algorithm, called Kart, which can process long reads as fast as short reads by dividing a read into small fragments that can be aligned independently. Our experiment result indicates that the average size of fragments requiring the more time-consuming gapped alignment is around 20 bp regardless of the original read length. Furthermore, it can tolerate much higher error rates. The experiments show that Kart spends much less time on longer reads than other aligners and still produce reliable alignments even when the error rate is as high as 15%. Availability and Implementation Kart is available at https://github.com/hsinnan75/Kart/ . Contact [email protected]. Supplementary information Supplementary data are available at Bioinformatics online.
منابع مشابه
DCA: an efficient implementation of the divide-and-conquer approach to simultaneous multiple sequence alignment
MOTIVATION DCA is a new computer program for multiple sequence alignment which utilizes a 'divide-and-conquer' type of heuristic approach. AVAILABILITY The algorithm is freely available from http://bibiserv.TechFak.Uni-Bielefeld.DE/dca/.
متن کاملFree Vibration Analysis of Repetitive Structures using Decomposition, and Divide-Conquer Methods
This paper consists of three sections. In the first section an efficient method is used for decomposition of the canonical matrices associated with repetitive structures. to this end, cylindrical coordinate system, as well as a special numbering scheme were employed. In the second section, divide and conquer method have been used for eigensolution of these structures, where the matrices are in ...
متن کاملAlgorithms for Sequence Alignment
Sequence alignment is an important tool for describing relationships between sequences. Many sequence alignment algorithms exist, differing in efficiency, and in their models of the sequences and of the relationship between sequences. The focus of this thesis is on algorithms for the optimal alignment of two or three sequences of biological data, particularly DNA sequences. The algorithms are d...
متن کاملMultiple DNA Sequence Alignment Based on Genetic Algorithms and Divide-and-Conquer Techniques
Multiple DNA sequence alignment is one of the important research topics of bioinformatics. Because of the huge length of DNA sequences of advanced organisms, some researchers used divide-and-conquer techniques to cut the sequences for decreasing the space complexity for sequence alignment. Because the cutting points of sequences of the existing methods are fixed at the middle or the near-middle...
متن کاملDivide-and-conquer multiple alignment with segment-based constraints
A large number of methods for multiple sequence alignment are currently available. Recent benchmarking tests demonstrated that strengths and drawbacks of these methods differ substantially. Global strategies can be outperformed by approaches based on local similarities and vice versa, depending on the characteristics of the input sequences. In recent years, mixed approaches that include both gl...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 33 شماره
صفحات -
تاریخ انتشار 2017